The two parameter Poisson-Dirichlet distribution $\mathrm{PD}(\alpha,\theta)$ is the distribution of an infinite dimensional random discrete probability. It is a generalization of Kingman's Poisson-Dirichlet distribution. The two parameter Dirichlet process $\Pi_{\alpha,\theta,\nu_0}$ is the law of a pure atomic random measure with masses following the two parameter Poisson-Dirichlet distribution. In this article we focus on the construction and the properties of the infinite dimensional symmetric diffusion processes with respective symmetric measures $\mathrm{PD}(\alpha,\theta)$ and $\Pi_{\alpha,\theta,\nu_0}$. The methods used come from the theory of Dirichlet forms.



1 Introduction 

The Poisson-Dirichlet distribution $\mathrm{PD}(\theta)$ was introduced by Kingman in [11] to describe the distribution of gene frequencies in a large neutral population at a particular locus. The component $P_k(\theta)$ represents the proportion of the $k$-th most frequent allele. The Dirichlet process $\Pi_{\theta,\nu_0}$ first appeared in [6] in the context of Bayesian statistics. It is a pure atomic random measure with masses distributed according to $\mathrm{PD}(\theta)$.

In the context of population genetics, both the Poisson-Dirichlet distribution and the Dirichlet process appear as approximations to the equilibrium behavior of certain large populations evolving under the influence of mutation and random genetic drift. To be precise, let $C_b(S)$ be the set of bounded, continuous functions on a locally compact, separable metric space $S$, let $\mathcal{M}_1(S)$ denote the space of all probability measures on $S$ equipped with the usual weak topology, and let $\nu_0 \in \mathcal{M}_1(S)$. We consider the operator $B$ of the form

$$Bf(x) = \frac{\theta}{2}\int_S (f(y)-f(x))\,\nu_0(dy), \qquad f \in C_b(S).$$

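Define

$$\mathcal{D} = \{u : u(\mu) = f(\langle \varphi,\mu\rangle),\ f \in C_b^\infty(\mathbb{R}),\ \varphi \in C_b(S),\ \mu \in \mathcal{M}_1(S)\},$$

where $C_b^\infty(\mathbb{R})$ denotes the set of all bounded, infinitely differentiable functions on $\mathbb{R}$. Then the Fleming-Viot process with neutral parent independent mutation, or the labeled infinitely-many-neutral-alleles model, is a pure atomic measure-valued Markov process with generator

$$Lu(\mu) = \langle B\nabla u(\mu)(\cdot),\mu\rangle + \frac{f''(\langle\varphi,\mu\rangle)}{2}\,\langle\varphi,\varphi\rangle_\mu, \qquad u \in \mathcal{D},$$

where

$$\nabla u(\mu)(x) = \frac{\delta u(\mu)}{\delta\mu(x)} = \lim_{\varepsilon\to 0+}\varepsilon^{-1}\{u((1-\varepsilon)\mu+\varepsilon\delta_x)-u(\mu)\},$$

$\langle\varphi,\varphi\rangle_\mu := \langle\varphi^2,\mu\rangle - \langle\varphi,\mu\rangle^2$, and $\delta_x$ stands for the Dirac measure at $x \in S$. For compact space $S$ and diffusive probability $\nu_0$, i.e., $\nu_0(\{x\}) = 0$ for every $x$ in $S$, it is known (cf. [2]) that the labeled infinitely-many-neutral-alleles model is reversible with reversible measure $\Pi_{\theta,\nu_0}$.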

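Introduce a map $\Phi$ from $\mathcal{M}_1(S)$ to the infinite dimensional ordered simplex

$$\nabla_\infty = \Big\{(x_1,x_2,\ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ \sum_{i=1}^\infty x_i = 1\Big\}$$

so that $\Phi(\mu)$ is the ordered masses of $\mu$. Then the labeled infinitely-many-neutral-alleles model is mapped through $\Phi$ to another symmetric diffusion process, called the unlabeled infinitely-many-neutral-alleles model, on $\nabla_\infty$ with generator

$$A^0 = \frac12\sum_{i,j=1}^\infty x_i(\delta_{ij}-x_j)\frac{\partial^2}{\partial x_i\partial x_j} - \frac{\theta}{2}\sum_{i=1}^\infty x_i\frac{\partial}{\partial x_i}, \tag{1.1}$$

defined on an appropriate domain. The symmetric measure of this process is $\mathrm{PD}(\theta)$.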

For any $0 \le \alpha < 1$ and $\theta > -\alpha$, let $U_k$, $k = 1, 2, \ldots$, be a sequence of independent random variables such that $U_k$ has $\mathrm{Beta}(1-\alpha,\theta+k\alpha)$ distribution. Set

$$V_1^{\alpha,\theta} = U_1, \qquad V_n^{\alpha,\theta} = (1-U_1)\cdots(1-U_{n-1})U_n, \quad n \ge 2,$$

and let $P(\alpha,\theta) = (\rho_1,\rho_2,\ldots)$ denote $(V_1^{\alpha,\theta}, V_2^{\alpha,\theta}, \ldots)$ in descending order. The distribution of $(V_1^{\alpha,\theta}, V_2^{\alpha,\theta}, \ldots)$ is called the two parameter GEM distribution. The law of $P(\alpha,\theta)$ is called the two parameter Poisson-Dirichlet distribution, denoted by $\mathrm{PD}(\alpha,\theta)$. For a locally compact, separable metric space $S$, and a sequence of i.i.d. $S$-valued random variables $\xi_k$, $k = 1,2,\ldots$, with common diffusive distribution $\nu_0$ on $S$, let

$$\Xi_{\alpha,\theta,\nu_0} = \sum_{k=1}^\infty \rho_k\,\delta_{\xi_k}. \tag{1.2}$$

The distribution of $\Xi_{\alpha,\theta,\nu_0}$, denoted by $\mathrm{Dirichlet}(\theta,\alpha,\nu_0)$ or $\Pi_{\alpha,\theta,\nu_0}$, is called the two-parameter Dirichlet process. Clearly $\mathrm{PD}(\theta)$ and $\Pi_{\theta,\nu_0}$ correspond to $\alpha = 0$ in $\mathrm{PD}(\alpha,\theta)$ and $\Pi_{\alpha,\theta,\nu_0}$, respectively.
As was indicated in [18] and the references therein, the two parameter Poisson-Dirichlet distribution and Dirichlet process are natural generalizations of their one parameter counterparts and possess many similar structures including the urn construction, GEM representation, sampling formula, etc. The cases of $\theta = 0, \alpha$ are associated with distributions of the lengths of excursions of Bessel processes and Bessel bridge, respectively. It is thus natural to investigate the two parameter generalizations of the labeled and unlabeled infinitely-many-neutral-alleles models. One would hope that these dynamical models will enhance our understanding of the two parameter distributions.

Several papers have appeared recently discussing the stochastic dynamics associated with the two parameter distributions. A symmetric diffusion process appears in [5], where the symmetric measure is the GEM distribution. An infinite dimensional diffusion process is constructed in [16] generalizing the unlabeled infinitely-many-neutral-alleles model to the two-parameter setting. In [1], $\mathrm{PD}(\alpha,\theta)$ is shown to be the unique reversible measure of a continuous time Markov chain constructed through an exchangeable fragmentation coalescence process. But it is still an open problem to construct the two parameter measure-valued process generalizing the Fleming-Viot process with parent independent mutation.

In this article, we will consider two diffusion processes that are analogous to the unlabeled and labeled infinitely-many-neutral-alleles models. In Section 2, an unlabeled two parameter infinitely-many-neutral-alleles diffusion model is constructed via the classical gradient Dirichlet forms. This process is shown to coincide with the process constructed in [16]. Besides establishing the existence and uniqueness of the process, we also obtain results on the sample path properties, the large deviations for the occupation time process, and the model with interactive selection. The construction of the labeled infinitely-many-neutral-alleles diffusion model turns out to be much harder. Here the evolution of the system involves both the masses and the locations. Note that the one parameter model with finitely many types is the Wright-Fisher diffusion, and the partition property of the infinite type model makes finite dimensional approximation possible. However, in the two parameter setting, the finite type model itself is already a challenging problem, not to mention the loss of the partition property. In Section 3, we construct a general bilinear form that, if closable, will generate the needed diffusion process. If the type space contains only two elements, or the type space is general but $\alpha = -\kappa$ and $\theta = m\kappa$ for some $\kappa > 0$ and integer $m \ge 2$, then the above bilinear form is closable and a symmetric diffusion can be constructed accordingly. The closability problem in the general case boils down to establishing the boundedness of a linear functional. An auxiliary result is enclosed at the end of the article to demonstrate the difficulty involved in establishing the boundedness. If the bilinear form is indeed not closable, then its relaxation may be considered.

2 Unlabeled Model 

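Let

$$\overline{\nabla}_\infty := \Big\{x = (x_1,x_2,\ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ \sum_{i=1}^\infty x_i \le 1\Big\}$$

be the closure of $\nabla_\infty$ in the product space $[0,1]^\infty$. For $0 \le \alpha < 1$ and $\theta > -\alpha$, we extend the two parameter Poisson-Dirichlet distribution $\mathrm{PD}(\alpha,\theta)$ from $\nabla_\infty$ to $\overline{\nabla}_\infty$. To simplify notation, we still use $\mathrm{PD}(\alpha,\theta)$ to denote this extended distribution. Let $a(x)$ be the infinite matrix whose $(i,j)$-th entry is $x_i(\delta_{ij}-x_j)$. Denote by $\mathcal{P}$ the algebra generated by $1, \varphi_2, \varphi_3, \ldots, \varphi_m, \ldots$, where $\varphi_m(x) = \sum_{i=1}^\infty x_i^m$. We consider the bilinear form $\mathcal{A}$ of the form

$$\mathcal{A}(u,v) = \frac12\int_{\overline{\nabla}_\infty}\langle\nabla u, a(x)\nabla v\rangle\,d\mathrm{PD}(\alpha,\theta), \qquad u,v \in \mathcal{P}.$$

Theorem 2.1 The symmetric bilinear form $(\mathcal{A},\mathcal{P})$ is closable on $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$ and its closure $(\mathcal{A},D(\mathcal{A}))$ is a regular Dirichlet form.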

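Proof Define

$$A = \frac12\Big\{\sum_{i,j=1}^\infty x_i(\delta_{ij}-x_j)\frac{\partial^2}{\partial x_i\partial x_j} - \sum_{i=1}^\infty(\theta x_i+\alpha)\frac{\partial}{\partial x_i}\Big\}. \tag{2.1}$$

The case of $\alpha = 0$ corresponds to $A^0$ defined in (1.1). One finds that for any $u,v \in \mathcal{P}$,

$$A^0(uv) = A^0u\cdot v + A^0v\cdot u + \langle\nabla u, a(x)\nabla v\rangle.$$

Hence

$$A(uv) = Au\cdot v + Av\cdot u + \langle\nabla u, a(x)\nabla v\rangle. \tag{2.2}$$

We claim that

$$\int_{\overline{\nabla}_\infty} Au\,d\mathrm{PD}(\alpha,\theta) = 0, \qquad \forall u \in \mathcal{P}. \tag{2.3}$$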
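In fact, let $m_1,\ldots,m_k \in \{2,3,\ldots\}$ and $k \ge 1$, and write $m = m_1+\cdots+m_k$. Then we obtain by (2.1), (1.1) and [4, (2.13)] that

$$\begin{aligned}
A(\varphi_{m_1}\cdots\varphi_{m_k})
&= \sum_{i=1}^k\Big[\binom{m_i}{2}-\frac{m_i\alpha}{2}\Big]\varphi_{m_i-1}\prod_{j\ne i}\varphi_{m_j}
 + \sum_{i<j} m_im_j\,\varphi_{m_i+m_j-1}\prod_{l\ne i,j}\varphi_{m_l} \\
&\quad - \Big[\sum_{i=1}^k\Big(\binom{m_i}{2}+\frac{m_i\theta}{2}\Big)+\sum_{i<j}m_im_j\Big]\prod_{i=1}^k\varphi_{m_i} \\
&= \sum_{i=1}^k\Big[\binom{m_i}{2}-\frac{m_i\alpha}{2}\Big]\varphi_{m_i-1}\prod_{j\ne i}\varphi_{m_j}
 + \sum_{i<j} m_im_j\,\varphi_{m_i+m_j-1}\prod_{l\ne i,j}\varphi_{m_l} \\
&\quad - \frac12 m(m-1+\theta)\prod_{i=1}^k\varphi_{m_i}.
\end{aligned} \tag{2.4}$$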

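Denote by $\{n_1,n_2,\ldots,n_l\}$ an arbitrary partition of $\{m_1,m_2,\ldots,m_k\}$. That is, each $n_j$ is a sum $m_{i_1}+\cdots+m_{i_r}$ over some distinct indexes $i_1,\ldots,i_r$ (depending on $j$), and the index sets of the $l$ blocks together form a partition of $\{1,2,\ldots,k\}$. By the Ewens-Pitman sampling formula, we get

$$\begin{aligned}
\int_{\overline{\nabla}_\infty} A(\varphi_{m_1}\cdots\varphi_{m_k})\,d\mathrm{PD}(\alpha,\theta)
&= \sum_{n_1,n_2,\ldots,n_l}\Bigg\{\sum_{j=1}^l\frac{n_j}{2}(n_j-1-\alpha)\,
\frac{(-\frac{\theta}{\alpha})(-\frac{\theta}{\alpha}-1)\cdots(-\frac{\theta}{\alpha}-l+1)}{\theta(\theta+1)\cdots(\theta+m-2)} \\
&\qquad\qquad \times (-\alpha)\cdots(-\alpha+n_j-2)\prod_{i\ne j}(-\alpha)\cdots(-\alpha+n_i-1) \\
&\qquad - \frac12 m(m-1+\theta)\,
\frac{(-\frac{\theta}{\alpha})(-\frac{\theta}{\alpha}-1)\cdots(-\frac{\theta}{\alpha}-l+1)}{\theta(\theta+1)\cdots(\theta+m-1)}
\prod_{j=1}^l(-\alpha)\cdots(-\alpha+n_j-1)\Bigg\} \\
&= 0,
\end{aligned}$$

where the value of the right hand side is obtained by continuity when $\alpha = 0$ or $\theta = 0$. Similarly, by the Ewens-Pitman sampling formula, we can further check that

$$\int_{\overline{\nabla}_\infty}(Au)v\,d\mathrm{PD}(\alpha,\theta) = \int_{\overline{\nabla}_\infty}(Av)u\,d\mathrm{PD}(\alpha,\theta), \qquad \forall u,v \in \mathcal{P}. \tag{2.5}$$

By (2.2), (2.3) and (2.5), we get

$$\mathcal{A}(u,v) = -\int_{\overline{\nabla}_\infty}(Au)v\,d\mathrm{PD}(\alpha,\theta), \qquad \forall u,v \in \mathcal{P}.$$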



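Hence the symmetric bilinear form $(\mathcal{A},\mathcal{P})$ is closable on $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$ by [13, Proposition 3.3]. The closure $(\mathcal{A},D(\mathcal{A}))$ of $(\mathcal{A},\mathcal{P})$ is a symmetric closed form. To prove that $(\mathcal{A},D(\mathcal{A}))$ is a regular Dirichlet form, it is enough to show that $(\mathcal{A},D(\mathcal{A}))$ is a Markovian form. To this end, we will show that $(\mathcal{A},D(\mathcal{A}))$ is the same as the closure $(\mathcal{A},\overline{\mathcal{B}})$ of $(\mathcal{A},\mathcal{B})$ with

$$\mathcal{B} := \{u \in L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta)) : u = f\circ\pi_k \text{ for some } k,\ f \in C^\infty(\mathbb{R}^k)\},$$

where $\pi_k : \overline{\nabla}_\infty \to \mathbb{R}^k$, $(x_1,\ldots,x_k,\ldots) \mapsto (x_1,\ldots,x_k)$. Note that $(\mathcal{A},\mathcal{B})$ is clearly Markovian (cf. [7, Page 4]) and this property is preserved by its closure (cf. [7, Theorem 3.1.1]).

Let $m \ge 2$. Then one can show that $\varphi_m \in \overline{\mathcal{B}}$ by considering the approximation sequence $\{\varphi_m^n(x) := \sum_{i=1}^n x_i^m\}_{n\in\mathbb{N}}$. Thus $\mathcal{P} \subset \overline{\mathcal{B}}$. To show that $\mathcal{B} \subset D(\mathcal{A})$, we need to show that any finite-dimensional smooth function of the coordinates $x_1,x_2,\ldots$ belongs to $D(\mathcal{A})$. This can be done by polynomial approximation and noting the fact that

$$x_1 = \lim_{m\to\infty}(\varphi_m)^{1/m}, \qquad x_2 = \lim_{m\to\infty}(\varphi_m - x_1^m)^{1/m}, \quad \ldots,$$

where the convergence takes place pointwise on $\overline{\nabla}_\infty$.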

□ 
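
The pointwise limits used at the end of the proof, $x_1 = \lim_m \varphi_m^{1/m}$ and $x_2 = \lim_m(\varphi_m - x_1^m)^{1/m}$, can be illustrated on a finite vector of masses. This is only a sketch: the helper names are hypothetical, the input is assumed to have a unique largest entry, and exact rational arithmetic is used because $\varphi_m - x_1^m$ is far below floating-point resolution relative to $\varphi_m$.

```python
from fractions import Fraction


def power_sum(m, x):
    """phi_m(x) = sum_i x_i^m, computed exactly for rational inputs."""
    return sum(xi ** m for xi in x)


def top_two_from_power_sums(x, m):
    """Approximate the two largest coordinates from phi_m for a large m.

    x is a list of Fractions; the exact largest coordinate is used in the
    second formula, as in the limit x_2 = lim (phi_m - x_1^m)^(1/m).
    """
    x1_exact = max(x)
    phi_m = power_sum(m, x)
    x1_approx = float(phi_m) ** (1.0 / m)
    x2_approx = float(phi_m - x1_exact ** m) ** (1.0 / m)
    return x1_approx, x2_approx


if __name__ == "__main__":
    masses = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
    for m in (5, 20, 60):
        print(m, top_two_from_power_sums(masses, m))
```

The error decays geometrically in $m$, at a rate governed by the ratio of consecutive ordered masses.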

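It is worth noting that $\mathrm{PD}(\alpha,\theta)$ is the unique probability measure on $\nabla_\infty$ such that (2.3) is satisfied. In fact, suppose that $\mu \in \mathcal{M}_1(\nabla_\infty)$ satisfies

$$\int_{\nabla_\infty} Au\,d\mu = 0, \qquad \forall u \in \mathcal{P}.$$

Note that for any $m \ge 2$,

$$A\varphi_m = \Big[\binom{m}{2}-\frac{m\alpha}{2}\Big]\varphi_{m-1} - \Big[\binom{m}{2}+\frac{m\theta}{2}\Big]\varphi_m.$$

The fact that $\int_{\nabla_\infty} A\varphi_m\,d\mu = 0$ implies that

$$\int_{\nabla_\infty}\varphi_m\,d\mu = \frac{\Gamma(m-\alpha)\Gamma(\theta+1)}{\Gamma(1-\alpha)\Gamma(\theta+m)}
= \frac{\Gamma(\theta+1)}{\Gamma(\theta+\alpha)\Gamma(1-\alpha)}\int_0^1 u^m\,\frac{(1-u)^{\theta+\alpha-1}}{u^{\alpha+1}}\,du.$$

Then, we obtain by [18, (6)] that

$$\int_{\nabla_\infty}\varphi_m\,d\mu = \int_{\nabla_\infty}\varphi_m\,d\mathrm{PD}(\alpha,\theta), \qquad \forall m \in \mathbb{N}. \tag{2.6}$$

Furthermore, we obtain by (2.4), (2.6) and induction that

$$\int_{\nabla_\infty} u\,d\mu = \int_{\nabla_\infty} u\,d\mathrm{PD}(\alpha,\theta), \qquad \forall u \in \mathcal{P}.$$

Since $\mathcal{P}$ is measure-determining, $\mu = \mathrm{PD}(\alpha,\theta)$.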
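
The first step of this uniqueness argument is a one-term recursion: integrating $A\varphi_m$ against $\mu$ and using $\varphi_1 = 1$ on $\nabla_\infty$ gives $\int\varphi_m\,d\mu = \frac{m-1-\alpha}{m-1+\theta}\int\varphi_{m-1}\,d\mu$, which telescopes to the Gamma-function expression. A minimal numerical sketch of that telescoping is given below; the function names are hypothetical and the parameter values are arbitrary admissible choices.

```python
import math


def moment_by_recursion(m, alpha, theta):
    """Integral of phi_m obtained by iterating the stationarity relation.

    Starts from phi_1 = 1 and multiplies by (j - 1 - alpha) / (j - 1 + theta)
    for j = 2, ..., m.
    """
    value = 1.0
    for j in range(2, m + 1):
        value *= (j - 1 - alpha) / (j - 1 + theta)
    return value


def moment_closed_form(m, alpha, theta):
    """Gamma(m - alpha) Gamma(theta + 1) / (Gamma(1 - alpha) Gamma(theta + m)).

    All four arguments are positive when 0 <= alpha < 1, theta > -alpha, m >= 1,
    so log-Gamma can be used safely.
    """
    log_value = (math.lgamma(m - alpha) + math.lgamma(theta + 1.0)
                 - math.lgamma(1.0 - alpha) - math.lgamma(theta + m))
    return math.exp(log_value)


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, moment_by_recursion(m, 0.3, 0.5), moment_closed_form(m, 0.3, 0.5))
```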

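By the theory of Dirichlet forms, there exists an essentially unique Hunt process $(X,(P_x)_{x\in\overline{\nabla}_\infty})$ on $\overline{\nabla}_\infty$ with the stationary distribution $\mathrm{PD}(\alpha,\theta)$ such that $X$ is associated with the Dirichlet form $(\mathcal{A},D(\mathcal{A}))$ (cf. [7, Chapter 7]). Note that $A1 = 0$. By [20, Proposition 2.3], one finds that $X$ is a conservative diffusion process. Denote $P_{\mathrm{PD}(\alpha,\theta)}(\cdot) = \int_{\overline{\nabla}_\infty}P_x(\cdot)\,\mathrm{PD}(\alpha,\theta)(dx)$. Then we have the following proposition.

Proposition 2.2 The process $X$ with initial distribution $\mathrm{PD}(\alpha,\theta)$ never leaves $\nabla_\infty$, i.e.,

$$P_{\mathrm{PD}(\alpha,\theta)}(X_t \in \nabla_\infty,\ \forall t \ge 0) = 1. \tag{2.7}$$

In addition, the process $X$ is ergodic, i.e.,

$$\lim_{t\to\infty}\Big\|T_tf - \int_{\overline{\nabla}_\infty}f\,d\mathrm{PD}(\alpha,\theta)\Big\|_{L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))} = 0, \qquad \forall f \in L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta)), \tag{2.8}$$

where $(T_t)_{t\ge0}$ denotes the semigroup associated with $(\mathcal{A},D(\mathcal{A}))$ on $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$.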

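Proof We first prove (2.7) by approximation. For $N \in \mathbb{N}$, denote $\varphi_1^N(x) := \sum_{i=1}^N x_i$, $x \in \overline{\nabla}_\infty$. Then $\lim_{N\to\infty}\|\varphi_1^N-\varphi_1\|_{L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))} = 0$. For $N > M$, we have that

$$\mathcal{A}(\varphi_1^N-\varphi_1^M,\varphi_1^N-\varphi_1^M) \le \frac12\sum_{i=M+1}^N\int_{\overline{\nabla}_\infty}x_i\,\mathrm{PD}(\alpha,\theta)(dx) \to 0 \quad\text{as } N,M \to \infty.$$

Thus $\{\varphi_1^N\}_{N\in\mathbb{N}}$ is an $\mathcal{A}$-Cauchy sequence such that $\varphi_1^N$ converges to $\varphi_1$ in $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$ as $N \to \infty$. By [7, Lemma 5.1.2], one finds that for any $T > 0$,

$$P_{\mathrm{PD}(\alpha,\theta)}\Big(\sum_{i=1}^N X_i(t) \text{ converges uniformly on } [0,T] \text{ as } N \to \infty\Big) = 1.$$

Then $P_{\mathrm{PD}(\alpha,\theta)}(t \mapsto \sum_{i=1}^\infty X_i(t) \text{ is continuous}) = 1$. Since for any fixed $t$, $P_{\mathrm{PD}(\alpha,\theta)}(\sum_{i=1}^\infty X_i(t) = 1) = \mathrm{PD}(\alpha,\theta)\{\sum_{i=1}^\infty x_i = 1\} = 1$, (2.7) holds.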

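Next we turn to the proof of the ergodicity. In fact, it is enough to verify (2.8) by considering the following family of functions

$$\mathcal{F} := \{\varphi_{m_1}\cdots\varphi_{m_k} : m_1,\ldots,m_k \in \{2,3,\ldots\},\ k \ge 1\}.$$

Let $f \in \mathcal{F}$. By (2.4), there exists a constant $\lambda > 0$ and $g \in \mathcal{F}$ with $\mathrm{degree}(g) < \mathrm{degree}(f)$ such that $Af = -\lambda f + g$. Then

$$T_tf = e^{-\lambda t}f + e^{-\lambda t}\int_0^t e^{\lambda s}T_sg\,ds, \qquad \forall t \ge 0. \tag{2.9}$$

Taking integration on both sides of (2.9), we obtain by the symmetry of $(T_t)_{t\ge0}$ that

$$\int_{\overline{\nabla}_\infty}f\,d\mathrm{PD}(\alpha,\theta) = e^{-\lambda t}\int_{\overline{\nabla}_\infty}f\,d\mathrm{PD}(\alpha,\theta) + e^{-\lambda t}\int_0^t e^{\lambda s}\Big(\int_{\overline{\nabla}_\infty}g\,d\mathrm{PD}(\alpha,\theta)\Big)ds. \tag{2.10}$$

Subtracting (2.10) from (2.9), we get

$$\begin{aligned}
\Big\|T_tf-\int_{\overline{\nabla}_\infty}f\,d\mathrm{PD}(\alpha,\theta)\Big\|_{L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))}
&\le e^{-\lambda t}\Big\|f-\int_{\overline{\nabla}_\infty}f\,d\mathrm{PD}(\alpha,\theta)\Big\|_{L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))} \\
&\quad + e^{-\lambda t}\int_0^t e^{\lambda s}\Big\|T_sg-\int_{\overline{\nabla}_\infty}g\,d\mathrm{PD}(\alpha,\theta)\Big\|_{L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))}ds.
\end{aligned}$$

Then we can establish (2.8) by using induction on the degree of $f$.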



□ 



Remark 2.3 The unlabeled two parameter infinitely-many-neutral-alleles diffusion model considered in this section is directly motivated by [16]. In [16], Petrov used up/down Markov chains and an approximation method to construct the model. In this section, we use the theory of Dirichlet forms to give a completely different construction. Our construction might be more direct and simpler. More importantly, our observation that the model is given by the classical gradient Dirichlet form enables us to use this powerful analytic tool to generalize various basic properties of the infinitely-many-neutral-alleles diffusion model from the one parameter setting to the two parameter setting. The Dirichlet form constructed here differs from the Dirichlet form associated with the GEM process in [5] even on symmetric functions.

There are many problems about the unlabeled two parameter infinitely-many-neutral- 
alleles diffusion model which deserve further investigation. As applications of Theorem 2.1, 
we present below several properties of the model via Dirichlet forms, including a sample path 
property, a result on large deviations, and the construction of models with selection. 

Theorem 2.4 Let $X$ be the unlabeled two parameter infinitely-many-neutral-alleles diffusion model and let $k \ge 1$. Denote $A_k := \overline{\nabla}_\infty\cap\{\sum_{i=1}^k x_i = 1\}$ and $D_k := \overline{\nabla}_\infty\cap\{\sum_{i=1}^k x_i = 1\}\cap\{x_k > 0\}$.

(i) If $\theta+\alpha k < 1$, then any subset of the $(k-1)$-dimensional simplex $A_k$ with non-zero $(k-1)$-dimensional Lebesgue measure is hit by $X$ with positive probability.

(ii) If $\theta+\alpha k \ge 1$, then $D_k$ is not hit by $X$.

Proof We will establish (i) and (ii) by generalizing [19, Propositions 2 and 3] to the two parameter setting. The results are based on Fukushima's classical result (cf. [7, Theorem 4.2.1]), which says that a Borel set $B$ is hit by $X$ if and only if $B$ has non-zero capacity. We use $\mathrm{Cap}(B)$ to denote the capacity of a Borel set $B$ (cf. [7, Chapter 2]). Recall that

$$\mathrm{Cap}(B) = \inf_{B\subset A,\ A \text{ is open}}\mathrm{Cap}(A)$$

and

$$\mathrm{Cap}(A) = \inf\Big\{\mathcal{A}(u,u)+\int_{\overline{\nabla}_\infty}u^2\,d\mathrm{PD}(\alpha,\theta) : u \in D(\mathcal{A}),\ u \ge 1 \text{ on } A,\ \mathrm{PD}(\alpha,\theta)\text{-a.e.}\Big\}$$

if $A$ is an open set.

For $k = 1$, let $\nu_1$ denote the Dirac measure at $(1,0,0,\ldots)$. For $k \ge 2$, let $S_{k-1} := \{x \in \mathbb{R}^{k-1} : x_1 \ge \cdots \ge x_{k-1} \ge 0,\ \sum_{i=1}^{k-1}x_i \le 1\}$ be equipped with $(k-1)$-dimensional Lebesgue measure and let $\nu_k$ denote the measure induced by the map $\xi : S_{k-1} \to A_k$, $\xi(x_1,\ldots,x_{k-1}) = (x_1,\ldots,x_{k-1},1-\sum_{i=1}^{k-1}x_i,0,0,\ldots)$. In order to show that $A_k$ has non-zero capacity if $\theta+\alpha k < 1$, it is enough to show that there is a dimension-independent constant $c > 0$ such that

$$\Big(\int u\,d\nu_k\Big)^2 \le c\Big(\mathcal{A}(u,u)+\int_{\overline{\nabla}_\infty}u^2\,d\mathrm{PD}(\alpha,\theta)\Big), \qquad \forall u \in D(\mathcal{A})\cap C(\overline{\nabla}_\infty). \tag{2.11}$$

For $n > k$, denote $B_k = S_n\cap\{\sum_{i=1}^k x_i = 1\}$ and use $\mu_n$, $\nu_{kn}$ to denote respectively the image measures of $\mathrm{PD}(\alpha,\theta)$, $\nu_k$ under the projection of $\overline{\nabla}_\infty$ onto the first $n$ coordinates. Then (2.11) is equivalent to

$$\Big(\int_{B_k}f\,d\nu_{kn}\Big)^2 \le c\int_{S_n}\Big(\tfrac12\langle\nabla f,a\nabla f\rangle+f^2\Big)d\mu_n, \qquad \forall f \in C^\infty(\mathbb{R}^n). \tag{2.12}$$

To prove (2.12), we will make use of a new coordinate system. Denote

$$r = 1-\sum_{i=1}^k(x_i-x_{k+1})$$

and

$$S'_{n-k} = \{x \in \mathbb{R}^{n-k} : x_1 \ge \cdots \ge x_{n-k} \ge 0,\ (k+1)x_1+x_2+\cdots+x_{n-k} \le 1\}.$$
Consider the map $\Phi : S_n\cap(0 < r < 1) \to S_{k-1}\times(0,1)\times S'_{n-k}$,

$$\Phi(x_1,\ldots,x_n) = (u_1,\ldots,u_{k-1},u_k,u_{k+1},\ldots,u_n)
= \Big(\frac{x_1-x_{k+1}}{1-r},\ldots,\frac{x_{k-1}-x_{k+1}}{1-r},\ r,\ \frac{x_{k+1}}{r},\ldots,\frac{x_n}{r}\Big).$$

It is a one-to-one onto map with the inverse

$$\begin{aligned}
x_1 &= (1-u_k)u_1+u_ku_{k+1},\\
&\ \ \vdots\\
x_{k-1} &= (1-u_k)u_{k-1}+u_ku_{k+1},\\
x_k &= (1-u_k)\big(1-(u_1+\cdots+u_{k-1})\big)+u_ku_{k+1},\\
x_{k+1} &= u_{k+1}u_k,\\
&\ \ \vdots\\
x_n &= u_nu_k.
\end{aligned} \tag{2.13}$$

One can check that the Jacobian of $\Phi^{-1}$ is $(1-u_k)^{k-1}u_k^{n-k}$.

Denote by $h$ the density function of $\mu_n$ with respect to $n$-dimensional Lebesgue measure. By [9, Theorem 5.5], we have that

$$h(x_1,\ldots,x_n) = C_{n,\alpha,\theta}\prod_{j=1}^n x_j^{-(\alpha+1)}\Big(1-\sum_{i=1}^n x_i\Big)^{\theta+\alpha n-1}\rho_{\alpha,\theta+\alpha n}\Big(\frac{1-\sum_{i=1}^n x_i}{x_n}\Big), \tag{2.14}$$

where

$$C_{n,\alpha,\theta} = \prod_{i=1}^n\frac{\Gamma(\theta+1+(i-1)\alpha)}{\Gamma(1-\alpha)\Gamma(\theta+i\alpha)},$$

which reduces to $\theta^n$ when $\alpha = 0$, and $\rho_{\alpha,\theta+\alpha n}$ is a two parameter version of Dickman's function, i.e.,

$$\rho_{\alpha,\theta+\alpha n}(s) = P\big(s\,\rho_1 < 1\big), \qquad s > 0,$$

with $\rho_1$ the largest component of a $\mathrm{PD}(\alpha,\theta+\alpha n)$ distributed sequence.

Note that $(1-\sum_{i=1}^n x_i)/x_n$ is only a function of $u_{k+1},\ldots,u_n$ by (2.13). Hence we obtain by (2.14) that the joint density of $(u_1,\ldots,u_n)$ under $\mu_n\circ\Phi^{-1}$ is given by

$$\tilde h(u_1,\ldots,u_n) = \psi(u_{k+1},\ldots,u_n)\,(x_1\cdots x_k)^{-(\alpha+1)}(1-u_k)^{k-1}u_k^{\theta+\alpha k-1}$$

for some function $\psi$. Note that the product $x_1\cdots x_k$ is only a function of $u_1,\ldots,u_{k+1}$. Hence the conditional density satisfies

$$\tilde h(u_1,\ldots,u_k\mid u_{k+1},\ldots,u_n) = \frac{(1-u_k)^{k-1}u_k^{\theta+\alpha k-1}(x_1\cdots x_k)^{-(\alpha+1)}}{\int_{S_{k-1}\times(0,1)}(1-u_k)^{k-1}u_k^{\theta+\alpha k-1}(x_1\cdots x_k)^{-(\alpha+1)}\,du_1\cdots du_k}.$$

Therefore, there exists a constant $c_1 > 0$, which depends on $\alpha$, $\theta$, $k$ and $\varepsilon > 0$ but not $n$, such that on $\{u_{k+1} \ge \varepsilon\}$ we have

$$\tilde h(u_1,\ldots,u_k\mid u_{k+1},\ldots,u_n) \ge c_1(1-u_k)^{k-1}u_k^{\theta+\alpha k-1}.$$

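We now prove (2.12). Without loss of generality, we assume that $f$ vanishes for $r\,(= u_k) \ge 1/2$. In fact, if this condition is not satisfied, we may obtain (2.12) by multiplying $f$ by a finite-dimensional smooth function $\gamma \in \mathcal{B}$, which is equal to 1 when $r = 0$ and vanishes for $r \ge 1/2$. Denote by $\sigma(du_{k+1},\ldots,du_n)$ the distribution of $u_{k+1},\ldots,u_n$ under $\mu_n\circ\Phi^{-1}$. We choose $\varepsilon > 0$ such that $p := \sigma(u_{k+1} \ge \varepsilon) > 0$. Define $\Lambda = S_{k-1}\times(0,1)\times S'_{n-k}$ and $\Lambda_\varepsilon = S_{k-1}\times(0,1)\times[S'_{n-k}\cap(u_{k+1} \ge \varepsilon)]$. To simplify notation, we denote by $I$ the integral on the left hand side of (2.12), and write $\partial_k$ for the partial derivative with respect to $u_k$. Then

$$\begin{aligned}
I &= \int_{S_{k-1}}f\Big(u_1,u_2,\ldots,u_{k-1},1-\sum_{i=1}^{k-1}u_i,0,\ldots,0\Big)du_1\cdots du_{k-1} \\
&= -\int_0^1\!\!\int_{S_{k-1}}\partial_kf\big((1-u_k)u_1+u_ku_{k+1},\ldots,u_nu_k\big)\,du_1\cdots du_{k-1}\,du_k \\
&= -\frac1p\int_{\Lambda_\varepsilon}\partial_k\big(f\circ\Phi^{-1}(u)\big)\,du_1\cdots du_k\,\sigma(du_{k+1}\cdots du_n) \\
&\le \frac1p\Big(\int_{\Lambda_\varepsilon}u_k\big[\partial_k(f\circ\Phi^{-1}(u))\big]^2(1-u_k)^{k-1}u_k^{\theta+\alpha k-1}\,du_1\cdots du_k\,\sigma(du_{k+1}\cdots du_n)\Big)^{1/2} \\
&\qquad\times\Big(\int_{\Lambda_\varepsilon\cap(u_k\le1/2)}(1-u_k)^{-(k-1)}u_k^{-(\theta+\alpha k)}\,du_1\cdots du_k\,\sigma(du_{k+1}\cdots du_n)\Big)^{1/2} \\
&\le C\Big(\int_{\Lambda}u_k\big[\partial_k(f\circ\Phi^{-1}(u))\big]^2\,\tilde h(u_1,\ldots,u_k\mid u_{k+1},\ldots,u_n)\,du_1\cdots du_k\,\sigma(du_{k+1}\cdots du_n)\Big)^{1/2} \\
&\le C\Big(\int_{\Lambda}\big\langle\nabla f(\Phi^{-1}(u)),a(\Phi^{-1}(u))\nabla f(\Phi^{-1}(u))\big\rangle\,\mu_n\circ\Phi^{-1}(du)\Big)^{1/2} \\
&= C\Big(\int_{S_n}\langle\nabla f,a(x)\nabla f\rangle\,\mu_n(dx)\Big)^{1/2} \\
&= C\,\mathcal{A}(f,f)^{1/2},
\end{aligned}$$

which proves (2.12). The second factor in the Cauchy-Schwarz step is finite precisely because $\theta+\alpha k < 1$. Here $C$ denotes a generic constant whose value may change from line to line but is independent of $n$. For the last inequality we have used the following estimate

$$u_k\big[\partial_k(f\circ\Phi^{-1}(u))\big]^2 \le C\big\langle\nabla f(\Phi^{-1}(u)),a(\Phi^{-1}(u))\nabla f(\Phi^{-1}(u))\big\rangle \quad\text{for } u_k \le \tfrac12,$$

which is given by [19, Lemma 3].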
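We now establish (ii). For $k = 1$, by (2.14), we have that

$$h(x_1) \le c\,x_1^{-(\alpha+1)}(1-x_1)^{\theta+\alpha-1}$$

for some constant $c > 0$. For $n \ge 1$, we choose $g_n \in C^\infty(\mathbb{R})$ satisfying $g_n(x) = 0$ if $x \le n$, $g_n(x) = 1$ if $x \ge 2n$, and $0 \le g_n(x) \le 1$ for all $x \in \mathbb{R}$. Also, we require that $|g_n'(x)| \le 2/n$ for all $x \in \mathbb{R}$. Set $u_n = g_n\circ\ln((1-x_1)^{-1})$. Then, if $\theta+\alpha \ge 1$, we have that

$$\begin{aligned}
\mathrm{Cap}(A_1) &\le \mathcal{A}(u_n,u_n)+\int_{\overline{\nabla}_\infty}u_n^2\,d\mathrm{PD}(\alpha,\theta) \\
&\le \frac{2c}{n^2}\int_{1-\exp(-n)}^{1-\exp(-2n)}\frac{x_1(1-x_1)}{(1-x_1)^2}\,x_1^{-(\alpha+1)}(1-x_1)^{\theta+\alpha-1}\,dx_1+\mathrm{PD}(\alpha,\theta)\{x_1 \ge 1-\exp(-n)\} \\
&\le \frac{2^{1+\alpha}c}{n^2}\int_{1-\exp(-n)}^{1-\exp(-2n)}(1-x_1)^{\theta+\alpha-2}\,dx_1+\mathrm{PD}(\alpha,\theta)\{x_1 \ge 1-\exp(-n)\} \\
&\to 0 \quad\text{as } n \to \infty.
\end{aligned}$$

For $k \ge 2$, we fix an $\varepsilon > 0$. Choose $w \in C^\infty(\mathbb{R})$ satisfying $w(x) = 0$ if $x \le \varepsilon$ and $w(x) = 1$ if $x \ge 2\varepsilon$. Let $s = \sum_{i=1}^k x_i$ and define $u_n = g_n\circ\ln((1-s)^{-1})$. Set $v_n(x) = u_n(x)w(x_k)$. Note that $v_n = 1$ on an open subset containing $(s = 1)\cap(x_k \ge 2\varepsilon)$ and $v_n$ vanishes if $x_k \le \varepsilon$. For a large $n$, the support of $v_n$ is contained in the set $(1-s)x_k^{-1} < 1$. Moreover, we obtain by (2.14) that there exists a constant $C(\varepsilon,\alpha,\theta) > 0$ such that

$$h(x_1,\ldots,x_k) \le C(\varepsilon,\alpha,\theta)(1-s)^{\theta+\alpha k-1} \quad\text{on the support of } v_n. \tag{2.15}$$

Since

$$\nabla(u_nw) = w\nabla u_n+u_n\nabla w,$$

we get

$$\mathcal{A}(v_n,v_n) \le \int_{\overline{\nabla}_\infty}w^2\langle\nabla u_n,a\nabla u_n\rangle\,d\mathrm{PD}(\alpha,\theta)+\int_{\overline{\nabla}_\infty}u_n^2\langle\nabla w,a\nabla w\rangle\,d\mathrm{PD}(\alpha,\theta). \tag{2.16}$$

Similar to the $k = 1$ case, we can use (2.15) to show that the first term of the right hand side of (2.16) tends to 0 as $n \to \infty$ if $\theta+\alpha k \ge 1$. Since $u_n^2\langle\nabla w,a\nabla w\rangle \to 0$ as $n \to \infty$, $\mathrm{PD}(\alpha,\theta)$-a.e., we conclude that $\mathrm{Cap}((s = 1)\cap(x_k \ge 2\varepsilon)) = 0$. Since $\varepsilon > 0$ is arbitrary, $\mathrm{Cap}(D_k) = 0$. The proof is complete. □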

Remark 2.5 In [19], Schmuland showed that, in the one parameter model, $A_k$ is hit by $X$ if and only if $\theta < 1$. The phase transition is between infinite and any finite alleles and occurs at $\theta = 1$. In the two parameter model, our Theorem 2.4 shows that the phase transition is between infinite and certain finite alleles (the number of alleles is no more than $k_c = [\frac{1-\theta}{\alpha}]$). The maximum number of finite alleles that can be hit is $[\frac1\alpha]$, corresponding to $\theta = 0$. So the number of alleles is either infinity or less than or equal to $[\frac1\alpha]$. This creates a barrier between finite alleles and infinite alleles. The results indicate an essential difference between the one parameter model and the two parameter model, which deserves a better explanation in terms of the coalescent.

We denote by $(L,D(L))$ the generator of the Dirichlet form $(\mathcal{A},D(\mathcal{A}))$ (cf. Theorem 2.1) on $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$. Note that $Lu = Au$ for all $u \in \mathcal{P}$, where $A$ is defined in (2.1). For $m \ge 2$, define $\lambda_m = m(m-1+\theta)/2$ and denote by $\pi(m)$ the number of partitions of the integer $m$.

Proposition 2.6 The spectrum of $(L,D(L))$ consists of the eigenvalues $\{0,-\lambda_2,-\lambda_3,\ldots\}$. $0$ is a simple eigenvalue and for each $m \ge 2$, the multiplicity of $-\lambda_m$ is $\pi(m)-\pi(m-1)$.

Proof The spectrum characterization has been obtained in [16] using the up/down Markov chains and approximation. However, a somewhat more transparent derivation can be given using our (2.4). Note that (2.4) is a consequence of Pitman's sampling formula and already indicates the structure of the spectrum of $(L,D(L))$. With [3, (1.4)] replaced with our (2.4), Proposition 2.6 then follows from an argument similar to that used in the proof of [3, Theorem 2.3].

□ 
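
As a small worked example of Proposition 2.6, the eigenvalues $-\lambda_m = -m(m-1+\theta)/2$ and their multiplicities $\pi(m)-\pi(m-1)$ can be tabulated directly. The sketch below uses a standard dynamic-programming count of integer partitions; the function names are hypothetical and the value of $\theta$ is an arbitrary choice.

```python
def partition_numbers(max_m):
    """Return [pi(0), pi(1), ..., pi(max_m)], the integer partition counts."""
    counts = [1] + [0] * max_m
    for part in range(1, max_m + 1):          # allow parts of size `part`
        for total in range(part, max_m + 1):
            counts[total] += counts[total - part]
    return counts


def spectrum(theta, max_m):
    """List of (eigenvalue, multiplicity) pairs for m = 2, ..., max_m."""
    pi = partition_numbers(max_m)
    return [(-m * (m - 1 + theta) / 2.0, pi[m] - pi[m - 1])
            for m in range(2, max_m + 1)]


if __name__ == "__main__":
    for eigenvalue, multiplicity in spectrum(theta=1.0, max_m=8):
        print(eigenvalue, multiplicity)
```

Note that the eigenvalues depend on $\theta$ only, while the multiplicities depend on neither parameter.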

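We now present a result on the large deviations for the occupation time process. It shows that the Dirichlet form $(\mathcal{A},D(\mathcal{A}))$ appears naturally as the function governing the large deviations. Define

$$L_t(C) := \frac1t\int_0^t 1_C(X_s)\,ds, \qquad \forall C \in \mathcal{B}(\overline{\nabla}_\infty),$$

where $\mathcal{B}(\overline{\nabla}_\infty)$ denotes the Borel $\sigma$-algebra of $\overline{\nabla}_\infty$. We equip $\mathcal{M}_1(\overline{\nabla}_\infty)$ with the $\tau$-topology, which is generated by open sets of the form

$$U(\nu;\varepsilon,F) := \Big\{\mu \in \mathcal{M}_1(\overline{\nabla}_\infty) : \Big|\int F\,d\mu-\int F\,d\nu\Big| < \varepsilon\Big\},$$

where $\varepsilon > 0$, $\nu \in \mathcal{M}_1(\overline{\nabla}_\infty)$ and $F \in B_b(\overline{\nabla}_\infty)$, the set of bounded Borel measurable functions on $\overline{\nabla}_\infty$. The next result follows from [15, Theorems 1 and 2].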
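Proposition 2.7 Let $U$ be a $\tau$-open subset and $K$ be a $\tau$-compact subset of $\mathcal{M}_1(\overline{\nabla}_\infty)$. Then for $\mathcal{A}$-q.e. $x \in \overline{\nabla}_\infty$, we have that

$$\liminf_{t\to\infty}\frac1t\log P_x[L_t \in U] \ge -\inf\{\mathcal{A}(u,u) \mid u \in D(\mathcal{A}),\ u^2\,\mathrm{PD}(\alpha,\theta) \in U\}$$

and

$$\inf_{N\subset\overline{\nabla}_\infty,\ N \text{ is } \mathcal{A}\text{-exceptional}}\ \sup_{x\in\overline{\nabla}_\infty\setminus N}\ \limsup_{t\to\infty}\frac1t\log P_x[L_t \in K] \le -\inf\{\mathcal{A}(u,u) \mid u \in D(\mathcal{A}),\ u^2\,\mathrm{PD}(\alpha,\theta) \in K\}.$$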


Finally, we would like to point out that the infinitely-many-neutral-alleles diffusion model 
considered in this section can be easily extended to include interactive selection. 

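Proposition 2.8 Let $\rho \in L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$ satisfying $\rho^2 \ge \varepsilon > 0$, $\mathrm{PD}(\alpha,\theta)$-a.e., or $\rho \in D(\mathcal{A})$ and $\rho > 0$, $\mathrm{PD}(\alpha,\theta)$-a.e. Then the perturbed bilinear form

$$\mathcal{A}^\rho(u,v) = \frac12\int_{\overline{\nabla}_\infty}\langle\nabla u,a(x)\nabla v\rangle\,\rho^2\,d\mathrm{PD}(\alpha,\theta), \qquad u,v \in \mathcal{P},$$

is closable on $L^2(\overline{\nabla}_\infty;\rho^2\,\mathrm{PD}(\alpha,\theta))$ and its closure $(\mathcal{A}^\rho,D(\mathcal{A}^\rho))$ is a regular local Dirichlet form.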

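Proof First, we consider the case that $\rho^2 \ge \varepsilon > 0$, $\mathrm{PD}(\alpha,\theta)$-a.e. Let $\{u_n \in \mathcal{P}\}_{n\in\mathbb{N}}$ be a sequence satisfying $u_n \to 0$ in $L^2(\overline{\nabla}_\infty;\rho^2\,\mathrm{PD}(\alpha,\theta))$ as $n \to \infty$ and $\mathcal{A}^\rho(u_n-u_m,u_n-u_m) \to 0$ as $n,m \to \infty$. Then the strict positivity of $\rho^2$ implies that $\{u_n\}_{n\in\mathbb{N}}$ is an $\mathcal{A}$-Cauchy sequence and $u_n \to 0$ in $L^2(\overline{\nabla}_\infty;\mathrm{PD}(\alpha,\theta))$. Hence the closability of $(\mathcal{A},\mathcal{P})$ implies that

$$\lim_{n\to\infty}\int_{\overline{\nabla}_\infty}\langle\nabla u_n,a(x)\nabla u_n\rangle\,d\mathrm{PD}(\alpha,\theta) = 0.$$

Thus $\lim_{n\to\infty}\mathcal{A}^\rho(u_n,u_n) = 0$ by Fatou's lemma. Therefore $(\mathcal{A}^\rho,\mathcal{P})$ is closable.

Now we consider the case that $\rho \in D(\mathcal{A})$ and $\rho > 0$, $\mathrm{PD}(\alpha,\theta)$-a.e. Let $(X,P_{\mathrm{PD}(\alpha,\theta)})$ be the Markov process associated with the Dirichlet form $(\mathcal{A},D(\mathcal{A}))$. Since $\rho \in D(\mathcal{A})$, it has a quasi-continuous version (cf. [7, Theorem 2.1.7]), which is denoted by $\tilde\rho$. For $n \in \mathbb{N}$, we define $\tau_n := \inf\{t > 0 : \tilde\rho(X_t) \le 1/n\}$ and $\tau := \lim_{n\to\infty}\tau_n$. On $\{t < \tau\}$, we define

$$M_t^{[\ln\rho]} := M_t^{[\ln(\rho\vee(1/n))]}, \qquad\text{if } t \le \tau_n,$$

where $M_t^{[\eta]}$ denotes the martingale part of the Fukushima decomposition of the additive functional $\tilde\eta(X_t)-\tilde\eta(X_0)$ if $\eta \in D(\mathcal{A})$ (cf. [7, Theorem 5.2.2]). We denote by $X^\rho$ the Girsanov transform of $X$ with the multiplicative functional $L_t^{[\rho]} := \exp\big(M_t^{[\ln\rho]}-\frac12\langle M^{[\ln\rho]}\rangle_t\big)1_{\{t<\tau\}}$, where $\langle\cdot\rangle$ denotes the quadratic variation of a martingale. Then $X^\rho$ is associated with a Dirichlet form on $L^2(\overline{\nabla}_\infty;\rho^2\,\mathrm{PD}(\alpha,\theta))$ that extends $(\mathcal{A}^\rho,\mathcal{P})$. Therefore $(\mathcal{A}^\rho,\mathcal{P})$ is closable. It is easy to check that its closure $(\mathcal{A}^\rho,D(\mathcal{A}^\rho))$ is a regular local Dirichlet form. The proof is complete.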

□ 



3 Labeled Model 

In this section, we will construct measure-valued processes associated with the two-parameter Dirichlet process through the study of a general bilinear form. We are successful in two particular cases (cf. Theorems 3.2 and 3.5 below).

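Let $S$ be a locally compact, separable metric space and $E := \mathcal{M}_1(S)$ be the space of probability measures on the Borel $\sigma$-algebra $\mathcal{B}(S)$ in $S$. Following (1.2), the two parameter Dirichlet process $\Pi_{\alpha,\theta,\nu_0}$ satisfies

$$\Pi_{\alpha,\theta,\nu_0}(A) = P\Big(\sum_{i=1}^\infty\rho_i\delta_{\xi_i} \in A\Big)$$

for any $A \in \mathcal{B}(E)$, the Borel $\sigma$-algebra of $E$. We denote by $E_P$ the expectation with respect to $P$. Set

$$\mathcal{F} := \mathrm{Span}\{\langle f_1,\mu\rangle\cdots\langle f_k,\mu\rangle : f_1,\ldots,f_k \in C_b(S),\ k \in \mathbb{N}\}.$$

Consider the following symmetric bilinear form

$$\mathcal{E}(u,v) = \frac12\int_E\langle\nabla u(\mu),\nabla v(\mu)\rangle_\mu\,\Pi_{\alpha,\theta,\nu_0}(d\mu), \qquad u,v \in \mathcal{F}. \tag{3.1}$$

Recall that $\nabla u(\mu)$ is the function

$$x \mapsto \frac{\delta u}{\delta\mu(x)}(\mu) = \lim_{\varepsilon\to0+}\frac{u((1-\varepsilon)\mu+\varepsilon\delta_x)-u(\mu)}{\varepsilon}$$

and $\langle f,g\rangle_\mu := \int fg\,d\mu-(\int f\,d\mu)(\int g\,d\mu)$. Note that $\Gamma(u,v) := \langle\nabla u(\mu),\nabla v(\mu)\rangle_\mu$ is a square field operator. If $(\mathcal{E},\mathcal{F})$ is closable on $L^2(E;\Pi_{\alpha,\theta,\nu_0})$, then following the argument of [21, Lemma 7.5 and Proposition 5.11], one can show that the closure $(\mathcal{E},D(\mathcal{E}))$ of $(\mathcal{E},\mathcal{F})$ is a quasi-regular local Dirichlet form. Hence, there exists an essentially unique diffusion process $X$ which is associated with $(\mathcal{E},D(\mathcal{E}))$ (cf. [13, Theorems IV.6.4 and V.1.11]). This diffusion process is called the labeled two parameter infinitely-many-neutral-alleles diffusion model.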
However, quite different from the unlabeled case, we find that the closability problem of $(\mathcal{E},\mathcal{F})$ is challenging. To understand this point, let us consider the case that the type space $S$ is finite. This is equivalent to projecting every $\mu$ in $\mathcal{M}_1(S)$ to $\{\mu(J_i) : i = 1,2,\ldots,n\}$ for a certain finite partition $\{J_i : i = 1,2,\ldots,n\}$ of the space $S$.

Let $\{\sigma_t : t \ge 0,\ \sigma_0 = 0\}$ be a subordinator with Lévy measure $x^{-(1+\alpha)}e^{-x}dx$, $x > 0$, and $\{\gamma_t : t \ge 0,\ \gamma_0 = 0\}$ be a gamma subordinator that is independent of $\{\sigma_t : t \ge 0,\ \sigma_0 = 0\}$ and has Lévy measure $x^{-1}e^{-x}dx$, $x > 0$. The next result follows from [18, Proposition 21] and the construction outlined on [17, Page 254].

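Proposition 3.1 (Pitman and Yor) Let

$$\gamma(\alpha,\theta) = \frac{\alpha\,\gamma(\frac{\theta}{\alpha})}{\Gamma(1-\alpha)}.$$

For each $n \ge 1$, and each partition $J_i : i = 1,\ldots,n$ of $S$, let

$$a_i = \nu_0(J_i), \qquad i = 1,\ldots,n,$$

and

$$Z_{\alpha,\theta}(t) = \sigma(\gamma(\alpha,\theta)t), \qquad t \ge 0.$$

Then the distribution of $(\Xi_{\alpha,\theta,\nu_0}(J_1),\ldots,\Xi_{\alpha,\theta,\nu_0}(J_n))$ is the same as the distribution of

$$\Big(\frac{Z_{\alpha,\theta}(a_1)}{Z_{\alpha,\theta}(1)},\ldots,\frac{Z_{\alpha,\theta}(\sum_{j=1}^n a_j)-Z_{\alpha,\theta}(\sum_{j=1}^{n-1}a_j)}{Z_{\alpha,\theta}(1)}\Big).$$

In general, the distribution function of $(\Xi_{\alpha,\theta,\nu_0}(J_1),\ldots,\Xi_{\alpha,\theta,\nu_0}(J_n))$ cannot be explicitly identified. The exception is the case that $|S| = 2$, i.e., $S$ contains only two elements.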

Theorem 3.2 Suppose that $|S| = 2$. Then $(\mathcal{E},\mathcal{F})$ is closable on $L^2(E;\Pi_{\alpha,\theta,\nu_0})$. Moreover, its closure $(\mathcal{E},D(\mathcal{E}))$ is a regular local Dirichlet form, which is associated with a diffusion process on $E$.

Proof We assume without loss of generality that $0 < \alpha < 1$ and $\theta > -\alpha$. It is enough to show that $(\mathcal{E},\mathcal{F})$ is closable on $L^2(E;\Pi_{\alpha,\theta,\nu_0})$. Once this is established, the proof of the last assertion of the theorem is easy. Set $S = \{1,2\}$, $E = [0,1]$ and $p = 1-\bar p = \nu_0(1)$. Denote by $dx$ the Lebesgue measure on $[0,1]$. Then $\mathcal{F}$ is the set of all polynomials restricted to $[0,1]$ and

$$\mathcal{E}(u,v) = \frac12\int_0^1 x(1-x)u'(x)v'(x)\,\Pi_{\alpha,\theta,\nu_0}(dx), \qquad u,v \in \mathcal{F}.$$

First, we consider the case that $\theta = 0$. It is known (cf. [12]) that

$$\Pi_{\alpha,0,\nu_0}(dx) = q_{\alpha,0}(x)\,dx$$

with

$$q_{\alpha,0}(x) = \frac{p\bar p\,\sin(\alpha\pi)\,x^{\alpha-1}(1-x)^{\alpha-1}}{\pi\big[\bar p^2x^{2\alpha}+p^2(1-x)^{2\alpha}+2p\bar p\,x^\alpha(1-x)^\alpha\cos(\alpha\pi)\big]}, \qquad 0 \le x \le 1.$$

Define

$$Lu(x) = \frac12x(1-x)u''(x)+\frac{\alpha}{2}u'(x)\Big[(1-2x)-\frac{2\bar p^2x^{2\alpha}(1-x)-2p^2(1-x)^{2\alpha}x+2p\bar p(1-2x)x^\alpha(1-x)^\alpha\cos(\alpha\pi)}{\bar p^2x^{2\alpha}+p^2(1-x)^{2\alpha}+2p\bar p\,x^\alpha(1-x)^\alpha\cos(\alpha\pi)}\Big].$$

Then one can check that $Lu \in L^2(E;\Pi_{\alpha,0,\nu_0})$ for any $u \in \mathcal{F}$ and

$$\mathcal{E}(u,v) = -\int_0^1(Lu)v\,d\Pi_{\alpha,0,\nu_0}, \qquad u,v \in \mathcal{F}.$$

Therefore $(\mathcal{E},\mathcal{F})$ is closable on $L^2(E;\Pi_{\alpha,0,\nu_0})$ by [13, Proposition 3.3].
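
As a sanity check on the $\theta = 0$ density, one can verify numerically that $q_{\alpha,0}(x) = p\bar p\sin(\alpha\pi)x^{\alpha-1}(1-x)^{\alpha-1}/\big(\pi[\bar p^2x^{2\alpha}+p^2(1-x)^{2\alpha}+2p\bar p\,x^\alpha(1-x)^\alpha\cos(\alpha\pi)]\big)$ integrates to one. The sketch below assumes this reading of the formula; the function names and the midpoint quadrature (after the substitution $x = \sin^2 t$, which tames the endpoint singularities for $\alpha \ge 1/2$) are illustrative choices, not part of the original argument. For $\alpha = p = 1/2$ the density reduces to the arcsine law $1/(\pi\sqrt{x(1-x)})$.

```python
import math


def q_alpha_zero(x, alpha, p):
    """Density of the mass at type 1 when theta = 0 and nu_0(1) = p."""
    p_bar = 1.0 - p
    cos_term = math.cos(alpha * math.pi)
    numerator = (math.sin(alpha * math.pi) * p * p_bar
                 * x ** (alpha - 1.0) * (1.0 - x) ** (alpha - 1.0))
    denominator = math.pi * (p_bar ** 2 * x ** (2.0 * alpha)
                             + p ** 2 * (1.0 - x) ** (2.0 * alpha)
                             + 2.0 * p * p_bar * cos_term
                             * x ** alpha * (1.0 - x) ** alpha)
    return numerator / denominator


def total_mass(alpha, p, n_cells=100000):
    """Midpoint rule for the integral of q over (0, 1), using x = sin(t)^2."""
    step = (math.pi / 2.0) / n_cells
    total = 0.0
    for i in range(n_cells):
        t = (i + 0.5) * step
        x = math.sin(t) ** 2
        jacobian = 2.0 * math.sin(t) * math.cos(t)  # dx/dt
        total += q_alpha_zero(x, alpha, p) * jacobian * step
    return total


if __name__ == "__main__":
    print(total_mass(0.5, 0.5))
    print(total_mass(0.7, 0.3))
```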

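We now consider the case that $\theta > 0$. To this end, we need to use a recent result of James et al. By [10, Example 5.1] (cf. also [10, Theorems 3.1 and 5.3]), we have that

$$\Pi_{\alpha,\theta,\nu_0}(dx) = q_{\alpha,\theta}(x)\,dx, \qquad q_{\alpha,\theta}(x) = \theta\int_0^x(x-t)^{\theta-1}\Delta_{\alpha,\theta+1}(t)\,dt.$$

Here

$$\Delta_{\alpha,\theta+1}(t) = \frac{\gamma_{\alpha-1}(t)\sin(\rho_{\alpha,\theta}(t))-\zeta_{\alpha-1}(t)\cos(\rho_{\alpha,\theta}(t))}{\pi\big[\zeta_\alpha^2(t)+\gamma_\alpha^2(t)\big]^{(\theta+\alpha)/2\alpha}}$$

with

$$\gamma_d(t) = \cos(d\pi)t^d\bar p+(1-t)^dp, \qquad \zeta_d(t) = \sin(d\pi)t^d\bar p, \qquad d > -1,$$

and

$$\rho_{\alpha,\theta}(t) = \frac{\theta}{\alpha}\arctan\frac{\zeta_\alpha(t)}{\gamma_\alpha(t)}+\frac{\pi\theta}{\alpha}1_{\Gamma_\alpha}(t), \qquad \Gamma_\alpha = \{t \in \mathbb{R}^+ : \gamma_\alpha(t) < 0\}.$$

When $\theta > 1$, the expression above can be rewritten as

$$q_{\alpha,\theta}(x) = (\theta-1)\int_0^x(x-t)^{\theta-2}\tilde\Delta_{\alpha,\theta}(t)\,dt$$

with

$$\tilde\Delta_{\alpha,\theta}(t) = \frac{\sin(\rho_{\alpha,\theta}(t))}{\pi\big\{\bar p^2t^{2\alpha}+p^2(1-t)^{2\alpha}+2p\bar p\cos(\alpha\pi)t^\alpha(1-t)^\alpha\big\}^{\theta/2\alpha}},$$

where $\Gamma_\alpha = \emptyset$ if $\alpha \in (0,1/2]$, whereas $\Gamma_\alpha = (0,v_\alpha/(1+v_\alpha))$ with $v_\alpha = (-p/(\bar p\cos(\alpha\pi)))^{1/\alpha}$ if $\alpha \in (1/2,1)$.

Define

$$Lu(x) = \frac12x(1-x)u''(x)+\frac12u'(x)\big[(1-2x)+x(1-x)q'_{\alpha,\theta}(x)/q_{\alpha,\theta}(x)\big].$$

Then one can check that $Lu \in L^2(E;\Pi_{\alpha,\theta,\nu_0})$ for any $u \in \mathcal{F}$ and

$$\mathcal{E}(u,v) = -\int_0^1(Lu)v\,d\Pi_{\alpha,\theta,\nu_0}, \qquad u,v \in \mathcal{F}.$$

Therefore $(\mathcal{E},\mathcal{F})$ is closable on $L^2(E;\Pi_{\alpha,\theta,\nu_0})$. The proof is complete.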



□ 



From Theorem 3.2, one can see that even in the one-dimensional case, the generator of the labeled two parameter infinitely-many-neutral-alleles diffusion model is very complicated. This indicates an essential difference between the unlabeled model and the labeled model. More importantly, it explains why it is so difficult to construct the labeled two parameter infinitely-many-neutral-alleles diffusion model using only the ordinary methods that are successful for the one parameter case. So far we have not been able to solve the closability problem for the general case. In what follows, we will give further results on the bilinear form $(\mathcal{E},\mathcal{F})$ and hope they can shed some light on the problem.

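Set

\[
\mathcal{G}:=\{G(\mu)=g(\langle f_1,\mu\rangle,\dots,\langle f_k,\mu\rangle):\ g\in C_b^{\infty}(\mathbb{R}^{k}),\ f_1,\dots,f_k\in C_b(S)\}.
\]

Let $f\in C_b(S)$ satisfy $\nu_0(f)=0$. We introduce the linear functional $B_f:\mathcal{G}\to\mathbb{R}$ defined
by

\[
B_f(G)=\sum_{s=1}^{\infty}\int G\Big(\sum_{i=1}^{\infty}\rho_i\delta_{\xi_i}\Big)f(\xi_s)\,dP,\qquad G\in\mathcal{G}. \tag{3.2}
\]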



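Note that (3.2) is well-defined since for $0<\alpha<1$ and $G(\mu)=g(\langle f_1,\mu\rangle,\dots,\langle f_k,\mu\rangle)\in\mathcal{G}$, we
have the following estimate:

\[
\Big|\int G\Big(\sum_{i=1}^{\infty}\rho_i\delta_{\xi_i}\Big)f(\xi_s)\,dP\Big|
\le \|f\|_{\infty}\big(\|\partial_1 g\|_{\infty}\|f_1\|_{\infty}+\cdots+\|\partial_k g\|_{\infty}\|f_k\|_{\infty}\big)\int\rho_s\,dP
\le c\,s^{-1/\alpha}
\]

by [18, (50)], where $c>0$ is a constant which is independent of $s$.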
Proposition 3.3 Let $f\in C_b(S)$. Then, for any $v(\mu)=\langle g_1,\mu\rangle\cdots\langle g_l,\mu\rangle$ with $g_1,\dots,g_l\in C_b(S)$, we have that

\[
\mathcal{E}(\langle f,\mu\rangle,v)=\frac{\theta}{2}\int_{E}\langle f-\nu_0(f),\mu\rangle\cdot v\,\Pi_{\alpha,\theta,\nu_0}(d\mu)+\frac{\alpha}{2}\,B_{f-\nu_0(f)}(v).
\]
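The coefficients $\theta/2$ and $\alpha/2$ in Proposition 3.3 can be checked by hand in the simplest case $l=1$, $v=\langle g,\mu\rangle$, $\nu_0(f)=0$. The sketch below does this in exact arithmetic. It rests on three assumptions that are not spelled out at this point of the text: that (3.1) is normalized as $\mathcal{E}(u,v)=\frac12\int(\langle fg,\mu\rangle-\langle f,\mu\rangle\langle g,\mu\rangle)\,d\Pi$ for linear functionals, that $E[\sum_i\rho_i^2]=(1-\alpha)/(1+\theta)$ (Pitman's sampling formula for two samples in one block), and that $B_f(\langle g,\cdot\rangle)=\nu_0(fg)$ after exchanging sum and expectation. The function names are illustrative.

```python
from fractions import Fraction

def sum_rho_squared(alpha, theta):
    """E[sum_i rho_i^2] under PD(alpha, theta): (1 - alpha) / (1 + theta)."""
    return (1 - alpha) / (1 + theta)

def form_direct(alpha, theta, nu_fg):
    """(1/2) * E[<fg,mu> - <f,mu><g,mu>] when nu0(f) = 0; nu_fg stands for nu0(fg)."""
    second_moment = sum_rho_squared(alpha, theta) * nu_fg   # E[<f,mu><g,mu>]
    return Fraction(1, 2) * (nu_fg - second_moment)

def form_via_proposition(alpha, theta, nu_fg):
    """(theta/2) * E[<f,mu><g,mu>] + (alpha/2) * B_f(<g,.>), with B_f(<g,.>) = nu0(fg)."""
    return theta / 2 * sum_rho_squared(alpha, theta) * nu_fg + alpha / 2 * nu_fg

if __name__ == "__main__":
    a, t, c = Fraction(1, 3), Fraction(5, 2), Fraction(7, 4)
    print(form_direct(a, t, c), form_via_proposition(a, t, c))
```

Both expressions reduce to $\frac12\nu_0(fg)\,(\theta+\alpha)/(1+\theta)$, so the two printed fractions coincide.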

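Proof Let $f\in C_b(S)$ and $v(\mu)=\langle g_1,\mu\rangle\cdots\langle g_l,\mu\rangle$ with $g_1,\dots,g_l\in C_b(S)$. Without loss
of generality we assume that $\nu_0(f)=0$. Then

\begin{align*}
\mathcal{E}(\langle f,\mu\rangle,v)
&=\frac{1}{2}\int_{E}\sum_{j=1}^{l}\big(\langle fg_j,\mu\rangle-\langle f,\mu\rangle\langle g_j,\mu\rangle\big)\prod_{i\neq j}\langle g_i,\mu\rangle\,\Pi_{\alpha,\theta,\nu_0}(d\mu)\\
&=\frac{1}{2}\int_{E}\sum_{j=1}^{l}\Big\{\langle fg_j,\mu\rangle\prod_{i\neq j}\langle g_i,\mu\rangle\Big\}\,\Pi_{\alpha,\theta,\nu_0}(d\mu)
-\frac{l}{2}\int_{E}\langle f,\mu\rangle\prod_{i=1}^{l}\langle g_i,\mu\rangle\,\Pi_{\alpha,\theta,\nu_0}(d\mu). \tag{3.3}
\end{align*}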



Set 

For $l\in\mathbb{N}$, let $\beta=(\beta_1,\beta_2,\dots,\beta_n)$ be an unordered partition of the set $\{1,2,\dots,l\}$. We
associate each $w$ with $\beta_w$ boxes, $1\le w\le n$. Assign the integers $1,2,\dots,l$ to the $l$ boxes,
each box containing exactly one integer. We denote such an arrangement by $A$. Two
arrangements are said to be the same if they have the same partition $\beta=(\beta_1,\beta_2,\dots,\beta_n)$
and each $w$, $1\le w\le n$, is assigned the same (unordered) set of integers. Define a map
$\tau:\{1,2,\dots,l\}\to\{1,2,\dots,n\}$ by $\tau(j)=w$ if $j$ is assigned to $w$. Then, we introduce a
linear functional $C_f:\mathcal{H}\to\mathbb{R}$ defined by

\begin{align*}
C_f(\langle g,\mu^{l}\rangle)
=\sum_{\text{distinct }A}\Big\{&\frac{(-\frac{\theta}{\alpha})(-\frac{\theta}{\alpha}-1)\cdots(-\frac{\theta}{\alpha}-(n-1))\prod_{s=1}^{n}(-\alpha)(1-\alpha)\cdots(\beta_s-1-\alpha)}{\theta(\theta+1)\cdots(\theta+l-1)}\\
&\cdot\int_{S^{n}}g(x_{\tau(1)},\dots,x_{\tau(l)})\sum_{s=1}^{n}f(x_s)\,\nu_0^{n}(dx_1\times\cdots\times dx_n)\Big\}, \tag{3.4}
\end{align*}

where the value of the right hand side is obtained by continuity when $\alpha=0$ or $\theta=0$. By
(3.3), Pitman's sampling formula, and comparing the arrangements for sizes $l$ and $l+1$, we
find that

\[
\mathcal{E}(\langle f,\mu\rangle,v)=\frac{\theta}{2}\int_{E}\langle f,\mu\rangle\cdot v\,\Pi_{\alpha,\theta,\nu_0}(d\mu)+\frac{\alpha}{2}\,C_f(v).
\]
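Let $g\in C_b(S^{l})$. Then by (3.4), the assumption that $\nu_0(f)=0$ and the dominated
convergence theorem, we get

\begin{align*}
C_f(\langle g,\mu^{l}\rangle)
&=E_{P}\Big[\sum_{\text{distinct }A}\ \sum_{\text{distinct }(i_1,i_2,\dots,i_n)}\rho_{i_1}^{\beta_1}\rho_{i_2}^{\beta_2}\cdots\rho_{i_n}^{\beta_n}\,g(\xi_{i_{\tau(1)}},\xi_{i_{\tau(2)}},\dots,\xi_{i_{\tau(l)}})\sum_{s\in\{i_1,i_2,\dots,i_n\}}f(\xi_s)\Big]\\
&=E_{P}\Big[\sum_{\text{distinct }A}\ \sum_{\text{distinct }(i_1,\dots,i_n)}\rho_{i_1}^{\beta_1}\cdots\rho_{i_n}^{\beta_n}\,g(\xi_{i_{\tau(1)}},\dots,\xi_{i_{\tau(l)}})\Big(\sum_{s=1}^{\infty}f(\xi_s)\Big)\Big]\\
&=\sum_{s=1}^{\infty}\int\!\!\int\sum_{\text{distinct }A}\ \sum_{\text{distinct }(i_1,i_2,\dots,i_n)}\big(\rho_{i_1}^{\beta_1}\rho_{i_2}^{\beta_2}\cdots\rho_{i_n}^{\beta_n}\big)\,g(\xi_{i_{\tau(1)}},\dots,\xi_{i_{\tau(l)}})\,f(\xi_s)\,dP_{\xi}\,dP_{\rho}\\
&=\sum_{s=1}^{\infty}\int\Big\langle g,\Big(\sum_{i=1}^{\infty}\rho_i\delta_{\xi_i}\Big)^{l}\Big\rangle f(\xi_s)\,dP\\
&=B_f(\langle g,\mu^{l}\rangle), \tag{3.5}
\end{align*}

where $P_{\xi}$ and $P_{\rho}$ denote the marginal distributions of $P$ with respect to $\xi$ and $\rho$, respectively.
The proof is complete. □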

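Remark 3.4 If one can show that the linear functional $B_{f-\nu_0(f)}$ defined by (3.2) is bounded,
then there exists a unique $b_f\in L^{2}(E;\Pi_{\alpha,\theta,\nu_0})$ such that

\[
B_{f-\nu_0(f)}(G)=\int_{E}b_f\cdot G\,d\Pi_{\alpha,\theta,\nu_0},\qquad\forall G\in\mathcal{G}.
\]

We define

\[
L(\langle f,\cdot\rangle)=-\frac{\theta}{2}\langle f-\nu_0(f),\cdot\rangle-\frac{\alpha}{2}\,b_f.
\]

Then

\[
\mathcal{E}(\langle f,\cdot\rangle,v)=-\int_{E}L(\langle f,\cdot\rangle)\,v\,d\Pi_{\alpha,\theta,\nu_0},\qquad\forall v\in\mathcal{F}.
\]

In general, we define the operator $L:\mathcal{F}\to L^{2}(E;\Pi_{\alpha,\theta,\nu_0})$ by induction as follows.

\[
L\Big(\prod_{i=1}^{k}\langle f_i,\cdot\rangle\Big)
=L\Big(\prod_{i=1}^{k-1}\langle f_i,\cdot\rangle\Big)\cdot\langle f_k,\cdot\rangle
+L(\langle f_k,\cdot\rangle)\cdot\prod_{i=1}^{k-1}\langle f_i,\cdot\rangle
+\Big\langle\nabla\Big(\prod_{i=1}^{k-1}\langle f_i,\cdot\rangle\Big),\nabla\langle f_k,\cdot\rangle\Big\rangle.
\]

Then one can check that

\[
\mathcal{E}(u,v)=-\int_{E}(Lu)\,v\,d\Pi_{\alpha,\theta,\nu_0},\qquad\forall u,v\in\mathcal{F}.
\]

Therefore, $(\mathcal{E},\mathcal{F})$ is closable on $L^{2}(E;\Pi_{\alpha,\theta,\nu_0})$ by ([13, Proposition 3.3]).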

If $(\mathcal{E},\mathcal{F})$ is indeed not closable for the general case, we may consider its relaxation.
We refer the reader to [14] for the definition, existence and uniqueness of relaxation. The
relaxation of $(\mathcal{E},\mathcal{F})$ is a Dirichlet form, whose associated Markov process is a good candidate
for the labeled two parameter infinitely-many-neutral-alleles diffusion model.

Theorem 3.5 Let $S$ be a locally compact, separable metric space, $E=\mathcal{M}_1(S)$ and $\nu_0\in\mathcal{M}_1(S)$.
Suppose that $\alpha=-\kappa$ and $\theta=m\kappa$ for some $\kappa>0$ and $m\in\{2,3,\dots\}$. We denote
by $\Pi_{\alpha,\theta,\nu_0}$ the finite Poisson-Dirichlet distribution. Then the symmetric bilinear form
$(\mathcal{E},\mathcal{F})$ of (3.1) is closable on $L^{2}(E;\Pi_{\alpha,\theta,\nu_0})$. Moreover, its closure $(\mathcal{E},D(\mathcal{E}))$ is a quasi-regular local
Dirichlet form, which is associated with a diffusion process on $E$.
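The role of the parameter choice $\alpha=-\kappa$, $\theta=m\kappa$ can be illustrated with Pitman's sampling formula: the factor $\prod_{i=1}^{k-1}(\theta+i\alpha)$ vanishes as soon as a sample is split into more than $m$ blocks, which is consistent with a random measure carrying at most $m$ atoms. The following sketch checks this in exact arithmetic; the helper names (`partitions`, `eppf`, `count_set_partitions`) and the specific values of $\kappa$, $m$, $n$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def eppf(sizes, alpha, theta):
    """Pitman's sampling formula for one set partition with the given block sizes."""
    n, k = sum(sizes), len(sizes)
    w = Fraction(1)
    for i in range(1, k):
        w *= theta + i * alpha          # prod_{i=1}^{k-1} (theta + i*alpha)
    for size in sizes:
        for j in range(1, size):
            w *= j - alpha              # (1-alpha)(2-alpha)...(size-1-alpha)
    for i in range(1, n):
        w /= theta + i                  # (theta+1)...(theta+n-1)
    return w

def count_set_partitions(sizes):
    """Number of set partitions of {1..n} whose block sizes are `sizes`."""
    c = factorial(sum(sizes))
    for size in sizes:
        c //= factorial(size)
    for size in set(sizes):
        c //= factorial(sizes.count(size))
    return c

if __name__ == "__main__":
    kappa, m, n = Fraction(1, 2), 3, 6
    alpha, theta = -kappa, m * kappa
    for lam in partitions(n):
        print(lam, count_set_partitions(lam) * eppf(lam, alpha, theta))
```

Partitions with more than $m$ parts receive weight zero, and the remaining weights sum to one.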

Proof By independence of the random variables $\{\xi_s, s=1,2,\dots\}$, we find that

\[
\int G\,(f-\nu_0(f))(\xi_s)\,dP=0,\qquad\forall G\in\mathcal{G}\ \text{and}\ s>m.
\]

Then the linear functional $B_{f-\nu_0(f)}$ defined by (3.2) is bounded. Therefore we conclude by
Remark 3.4 that $(\mathcal{E},\mathcal{F})$ is closable on $L^{2}(E;\Pi_{\alpha,\theta,\nu_0})$. Following the argument of ([21, Lemma
7.5 and Proposition 5.11]), we can further show that the closure $(\mathcal{E},D(\mathcal{E}))$ of $(\mathcal{E},\mathcal{F})$ is a
quasi-regular local Dirichlet form, which is thus associated with a diffusion process on $E$.
The proof is complete. □

Finally, we present an auxiliary result (cf. Proposition 3.6 below). This result indicates
some of the difficulty in showing the boundedness of the linear functional $C_f$ defined in (3.4).
Note that the relation between $C_f$ and $B_f$ is described by (3.5). In order to establish the
boundedness of $B_f$ and consequently the closability of $(\mathcal{E},\mathcal{F})$, a better understanding of the
two parameter Poisson-Dirichlet distributions seems to be needed.

Let $\lambda$ be a partition, i.e., a sequence of the form

\[
\lambda=(\lambda_1,\lambda_2,\dots,\lambda_{l(\lambda)},0,0,\dots),\qquad \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{l(\lambda)}>0,
\]

where $\lambda_i\in\mathbb{N}$. Denote $|\lambda|:=\lambda_1+\cdots+\lambda_{l(\lambda)}$. We identify partitions with Young diagrams.
For $k\in\mathbb{N}$, we denote by $[\lambda:k]$ the number of rows in $\lambda$ of length $k$. For $n\in\mathbb{N}$, we set (cf.
[16, Page 5])

\[
M_n(\lambda)=\frac{n!}{\prod_{k=1}^{\infty}[\lambda:k]!\,\prod_{i=1}^{l(\lambda)}\lambda_i!}\cdot
\frac{(-\frac{\theta}{\alpha})(-\frac{\theta}{\alpha}-1)\cdots(-\frac{\theta}{\alpha}-(l(\lambda)-1))\prod_{i=1}^{l(\lambda)}(-\alpha)(1-\alpha)\cdots(\lambda_i-1-\alpha)}
{\theta(\theta+1)\cdots(\theta+n-1)}.
\]



Proposition 3.6 Let $0<\alpha<1$ and $\theta>-\alpha$. Then

\[
\sum_{\lambda:|\lambda|=n}M_n(\lambda)\,l(\lambda)=O(n^{\alpha}). \tag{3.6}
\]
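The sum in (3.6) is the expected number of rows $l(\lambda)$ when $\lambda$ is drawn from $M_n$. A small numerical sketch of this reading follows. It computes $M_n(\lambda)$ in the equivalent product form $\prod_{i=1}^{l(\lambda)-1}(\theta+i\alpha)\prod_i(1-\alpha)\cdots(\lambda_i-1-\alpha)/((\theta+1)\cdots(\theta+n-1))$ times the combinatorial factor, and compares $\sum_\lambda M_n(\lambda)l(\lambda)$ with the standard Chinese-restaurant recursion $m_{j+1}=m_j+(\theta+\alpha m_j)/(\theta+j)$. The recursion and all function names are assumptions brought in for illustration; they do not appear in the text.

```python
import math

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def M(lam, alpha, theta):
    """M_n(lambda): Pitman's sampling formula for the integer partition lam."""
    n, k = sum(lam), len(lam)
    mult = math.factorial(n)
    for part in lam:
        mult //= math.factorial(part)
    for size in set(lam):
        mult //= math.factorial(lam.count(size))   # divide by [lambda:k]!
    w = 1.0
    for i in range(1, k):
        w *= theta + i * alpha
    for part in lam:
        for j in range(1, part):
            w *= j - alpha
    for i in range(1, n):
        w /= theta + i
    return mult * w

def expected_rows(n, alpha, theta):
    """Sum over |lambda| = n of M_n(lambda) * l(lambda), by direct enumeration."""
    return sum(M(lam, alpha, theta) * len(lam) for lam in partitions(n))

def expected_rows_recursive(n, alpha, theta):
    """Same quantity via the (assumed) Chinese-restaurant recursion."""
    m = 1.0
    for j in range(1, n):
        m += (theta + alpha * m) / (theta + j)
    return m

if __name__ == "__main__":
    a, t = 0.5, 1.0
    print(expected_rows(10, a, t), expected_rows_recursive(10, a, t))
    for n in (10, 100, 1000, 10000):
        print(n, expected_rows_recursive(n, a, t) / n ** a)
```

The ratio in the last loop stays bounded as $n$ grows, in line with the $O(n^{\alpha})$ bound.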

Proof We fix an $n\in\mathbb{N}$. Let $u(\mu)=v(\mu)=\langle 1,\mu\rangle\cdots\langle 1,\mu\rangle$ ($n$-fold products),
$f=1$ and $g_1=\cdots=g_n=1$. By considering (3.3) and (3.4), we get



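\[
0=\mathcal{E}(u,v)=\frac{\theta}{2}+\frac{\alpha}{2}\sum_{\lambda:|\lambda|=n}M_n(\lambda)\,l(\lambda)
-\frac{\theta+n}{2}\int_{E}\sum_{i=1}^{\infty}\rho_i(1-\rho_i)^{n}\,\Pi_{\alpha,\theta,\nu_0}(d\mu).
\]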

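Thus, to prove the desired inequality (3.6), we only need to show that

\[
\sup_{n\in\mathbb{N}}\Big\{n^{1-\alpha}\int_{E}\sum_{i=1}^{\infty}\rho_i(1-\rho_i)^{n}\,\Pi_{\alpha,\theta,\nu_0}(d\mu)\Big\}<\infty.
\]

By [18], we get

\begin{align*}
n^{1-\alpha}\int_{E}\sum_{i=1}^{\infty}\rho_i(1-\rho_i)^{n}\,\Pi_{\alpha,\theta,\nu_0}(d\mu)
&=C_1(\alpha,\theta)\,n^{1-\alpha}\int_{0}^{1}u^{-\alpha}(1-u)^{\alpha+\theta+n-1}\,du\\
&=C_1(\alpha,\theta)\,n^{1-\alpha}\cdot\mathrm{Beta}(1-\alpha,\alpha+\theta+n)\\
&\approx C_1(\alpha,\theta)\,n^{1-\alpha}\cdot\Gamma(1-\alpha)(1+\theta+n)^{-(1-\alpha)}\\
&\le C_2(\alpha,\theta),
\end{align*}

where $C_1(\alpha,\theta)>0$ and $C_2(\alpha,\theta)>0$ are constants depending only on $\alpha$ and $\theta$. The proof is
complete. □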
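The last step of the proof uses the asymptotics $\mathrm{Beta}(1-\alpha,\alpha+\theta+n)=\Gamma(1-\alpha)\Gamma(\alpha+\theta+n)/\Gamma(1+\theta+n)\approx\Gamma(1-\alpha)\,n^{-(1-\alpha)}$. A minimal sketch of that behaviour, computed through log-gamma to avoid overflow, is given below; the function name `scaled_beta` and the sample parameters are illustrative.

```python
import math

def scaled_beta(n, alpha, theta):
    """n^(1-alpha) * Beta(1-alpha, alpha+theta+n), evaluated via log-gamma."""
    log_beta = (math.lgamma(1 - alpha) + math.lgamma(alpha + theta + n)
                - math.lgamma(1 + theta + n))
    return math.exp((1 - alpha) * math.log(n) + log_beta)

if __name__ == "__main__":
    alpha, theta = 0.5, 1.0
    for n in (1, 10, 100, 10**4, 10**6):
        print(n, scaled_beta(n, alpha, theta))
    print("Gamma(1 - alpha) =", math.gamma(1 - alpha))
```

For $\theta\ge 0$ the scaled sequence increases towards $\Gamma(1-\alpha)$ and never exceeds it, which gives one admissible choice of the constant $C_2(\alpha,\theta)$ up to the factor $C_1(\alpha,\theta)$.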